ABSTRACT

Privacy-preserving techniques for distributed computation 
have been proposed recently as a promising framework in 
collaborative inter-domain network monitoring. Several different
approaches exist to solve this class of problems, e.g.,
Homomorphic Encryption (HE) and Secure Multiparty Com- 
putation (SMC) based on Shamir's Secret Sharing algorithm 
(SSS). Such techniques are complete from a computation- 
theoretic perspective: given a set of private inputs, it is 
possible to perform arbitrary computation tasks without re- 
vealing any of the intermediate results. In fact, HE and SSS 
can operate also on secret inputs and/or provide secret out- 
puts. However, they are computationally expensive and do 
not scale well in the number of players and/or in the rate 
of computation tasks. In this paper we advocate the use of
"elementary" (as opposed to "complete") Secure Multiparty
Computation (E-SMC) procedures for traffic monitoring.
E-SMC supports only simple computations with private input
and public output, i.e., it can handle neither secret input nor
secret (intermediate) output. Such a simplification brings
a dramatic reduction in complexity and enables massive- 
scale implementation with acceptable delay and overhead. 
Notwithstanding its simplicity, we claim that an E-SMC 
scheme is sufficient to perform a great variety of compu- 
tation tasks of practical relevance to collaborative network 
monitoring, including, e.g., anonymous publishing and set 
operations. This is achieved by combining an E-SMC scheme
with data structures like Bloom Filters and bitmap strings. 

1. INTRODUCTION 

Privacy-preserving techniques for distributed computation 
have been proposed recently as a promising tool in collab- 
orative inter-domain network monitoring — see, e.g., the 
motivating paper by Roughan and Zhang. In the reference
scenario, a set of ISPs are unwilling to share local traffic
data due to business sensitivity and/or concerns about their 
users' privacy. On the other hand, they have a collective
interest in performing some global computation on such data
and sharing the final result. For example, they might want
to aggregate local traffic measurements in order to reconstruct
global statistics, and these might be further processed
in order to unveil global threats (e.g., botnets) or discover
macroscopic anomalies. As already pointed out in prior work, each
ISP would benefit from comparing its own local view (of traffic
conditions) with the global view aggregated over all other
ISPs, especially on the occasion of anomalies and alarms, in
order to hint at whether the (unknown) root cause is local
or global, a major discriminator for deciding on the
reaction. Also, ISPs might be ready to share with other
ISPs information about security incidents observed locally,
provided that they can do so anonymously.

Two possible approaches to solve this class of problems
are Homomorphic Encryption (HE) and Secure Multiparty 
Computation (SMC) based on Shamir's Secret Sharing algorithm
(SSS for short). Both these techniques are "complete"
from a computation-theoretic perspective^1: given a set of
private inputs, it is possible, in principle, to compute any ar- 
bitrary function, including structured algorithms involving 
conditional statements, without revealing any of the inter- 
mediate results. In fact, a distinguishing feature of HE and 
SSS is that they can operate also on secret inputs and/or 
provide secret outputs (see the graphical representation in
Fig. 1(a)). The notions of secret and private are distinct:
private data is known in cleartext to at least one player (and 
usually only to one), while secret data remains unknown to
all players and cannot be reconstructed unless a minimum
number of players agree to do so. On the other hand, such
techniques are computationally expensive (especially HE)
and therefore do not scale well in the rate of computation
tasks (queries) and/or in the number of players.

In this paper we advocate the use of "elementary" (as opposed
to "complete") SMC procedures for collaborative traffic
monitoring. Such techniques, hereafter referred to as
E-SMC for short, have a fundamental limit: they support
only simple computations with private inputs and public output,
i.e., they can handle neither secret input nor secret (intermediate)
output. We show that such a simplification allows
for an enormous reduction in computational complexity and 
overhead, making such techniques amenable to massive-scale 
implementation. Notwithstanding its simplicity, we claim 
that E-SMC is sufficient to perform a broad variety of tasks 
of practical importance in the field of collaborative traffic 
monitoring. In fact, queries can be chained to build more
structured computation tasks (see Fig. 1(b)) whenever intermediate
results, which are necessarily public in E-SMC, are not regarded
as sensitive. Moreover, we show that an
additive E-SMC scheme can be combined with local transformations
on the private data and/or with particular data
structures (e.g., Bloom Filters, bitmap strings) in order to
extend the range of supported operations.

^1 A fully homomorphic, computationally complete HE
scheme has been introduced recently by Gentry [12]. The
completeness of SSS is shown in [3].

In this work we take a first step towards unfolding the 
potential of E-SMC for traffic monitoring. We make three 
main contributions. First, we present a simple scheme for 
E-SMC, called GCR, which is based on additive-only or 
multiplicative-only secret computation and extends an idea 
presented earlier in [2]. Second, we highlight some system-
design aspects of GCR that enable massive-scale implemen- 
tation: in particular, we propose to split the computation 
into offline randomization and online aggregation phases. 
Third, we describe how GCR can support a number of oper- 
ations relevant to collaborative traffic monitoring — like set 
operations, anonymous publishing and anonymous schedul- 
ing — when combined with data structures like Bloom Fil- 
ters and bitmap strings. 

The aim of this report is not to provide definitive results 
nor quantitative assessments, but rather to indicate a direc- 
tion of work to researchers engaged in inter-domain traffic 
monitoring. We claim that a broad variety of tasks of practical
relevance to this field do not require resorting to "complete"
(and complex) privacy-preserving schemes but can be
satisfactorily attained by E-SMC. Thanks to their simplicity,
collaborative systems based on E-SMC are amenable
to massive-scale implementation, with a very large number of
players and/or a very high rate of queries. In turn, system
scalability paves the way towards customer-driven collaborative
monitoring, where participating players do not map to
ISPs but rather to their customers: think, e.g., of mid-to-large
companies with their own IT security staff. This is indeed a
new avenue of collaborative network monitoring that might
have in E-SMC its enabling technology.

2. THE GCR METHOD 

We consider the classical SMC scenario where a set of n 
players collaborate to compute a function of some private
data, e.g., traffic statistics, network logs, records of security
incidents. As customary in SMC, we assume a semi-honest
model (also known as honest-but-curious): all players
cooperate honestly to compute the final result, but a subset
of them might collude to infer private information of other
players. In other words, no malicious player will attempt
to interrupt or corrupt the computation process, e.g., by
providing incorrect input data.

In this section we present a simple method to perform 
secure private addition which extends an idea presented earlier
by Atallah et al. in [2, §4.1] based on additive secret
sharing. We refer to our method as "Globally-Constrained 
Randomization", GCR for short. We show that GCR, which 
is simple conceptually, lends itself very well to massive-scale 
implementation. We propose also for the first time a varia- 
tion of the scheme to perform secure multiplication. 

2.1 Notation 

We consider a set of $n$ players $\{P_i, i = 1 \ldots n\}$ with $n > 3$
(normally $n \gg 1$). The maximum number of colluding
players will be denoted by $l$ (collusion threshold) with $l < n - 2$.
Note that $l$ is a design parameter that can be set
independently from the system size $n$. For each computation
task (query) each player $P_i$ involves three elements:

• $a_i$ is the private input of $P_i$ to the summation. For
some queries, it is obtained by applying a local transformation
$g()$ on some other inner private data $b_i$, i.e.,
$a_i = g(b_i)$.

• $r_i$ is the private random element which $P_i$ has previously
generated cooperatively with other players in the
way presented later.

• $v_i \triangleq a_i + r_i$ is the public input which $P_i$ eventually
announces to the other players.

The collection of random elements across all players constitutes
a Random Set (RS) and will be denoted by $r \triangleq \{r_i, i = 1 \ldots n\}$.
The goal of the computation round is
to obtain the public output result $A \triangleq f(a_1, a_2, \ldots, a_n) = f(g(b_1), g(b_2), \ldots, g(b_n))$
without disclosing the values of the
individual $a_i$'s. For each computation, all input elements
($a_i$, $r_i$, $v_i$) and the output $A$ must be in the same format.
For the additive scheme they must be defined over the same
additive commutative group (Abelian group). We will consider
the following distinct cases:

Real scalars: $a_i$, $r_i$ and $A$ are real numbers defined in the
interval $R_p \triangleq [0,p]$. For the sake of simplicity we
will assume $p$ integer, but not necessarily prime. The
group operation in this case is modulo-$p$ addition. A
generic random element $x$ is a random value extracted
uniformly in $[0,p]$, i.e., $x \sim U(0,p)$. The null element
is the zero value.

Integer scalars: this is a sub-case of the previous one, where
$a_i$, $r_i$ and $A$ are integers in $Z_p \triangleq [0,p]$. Unless otherwise
specified, $p$ is not necessarily a prime number. In
practice, it is convenient to choose $p = 2^q$ ($q$ integer)
so that modulo-$p$ addition maps to the wrap-around of a
$q$-bit counter.

Binary strings: $a_i$, $r_i$ and $A$ are binary strings of length
$k$. The group operation is therefore bitwise addition
(XORing). In this context a generic random element $x$
is a random string, i.e., a collection of bits set randomly
to 1 or 0 independently and with equal probabilities.
The null element is a string with all '0's.

Arrays of counters: $a_i$, $r_i$ and $A$ are vectors of $k$ elements,
and each element is a $q$-bit counter. The group operation
is therefore an array of $k$ parallel modulo-$p$
additions. In this context a generic random element $x$
is a collection of $k$ random values $\langle x_1, x_2, \ldots, x_k \rangle$ extracted
independently and uniformly in $[0,p-1]$. The
null element is an array of zeros.

The format of the input elements $a_i$, $r_i$, the exact values
of the parameters (e.g., $k$, $q$) and, if applicable, the choice
of the transformation function $g()$ depend on the particular
kind of operation (query) as detailed later. In the following
we will use the symbol '+' to refer generically to the addition
between two terms and '$\sum$' for multiple terms, without
specifying the group operation.

2.2 Description 

The central aspect of the GCR method is that RS is constructed
in a way that guarantees the zero-sum condition,
i.e., the composition of random elements across all users
sums up to the null element:

$$\sum_{i=1}^{n} r_i = 0 \qquad (1)$$

Figure 1: Graphical representation of a "complete" secure procedure with secret intermediate results (a)
and a sequence of "elementary" secure operations chained by public intermediate results (b).



Moreover, the generation of RS ensures that the individual
$r_i$'s cannot be inferred by other players, provided that
the number of colluding players remains below the collusion
threshold $l$. Each player $P_i$ then shares with other players
(e.g., via a central collector) the sum of data plus random
elements, i.e., $v_i = a_i + r_i$, which serves as the public input to
the computation. When all input elements $v_i$ are collected,
the value of $A$ is obtained by summing them all, formally:

$$\sum_{i=1}^{n} v_i = \sum_{i=1}^{n} (a_i + r_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} r_i = A + 0 = A \qquad (2)$$

Note that the value of A can be reconstructed only when the 
inputs from all players have been collected: it is sufficient 
that a single player (among those that have contributed to 
generate the RS $r$) fails to provide its input element to prevent
the computation of $A$. This is the main disadvantage
of GCR compared to SSS, as discussed later in §3.3.



RS generation. Hereafter we describe how each generic
player $P_i$ ($i = 1 \ldots n$) constructs its random element $r_i$ in
cooperation with other players, so as to collectively build the
RS $r$. Note that the RS generation procedure can be run
in parallel by all players and is completely asynchronous.
Each random element is initially set to the null element,
i.e., $r_i = 0$. Each player $P_i$ extracts $l + 1$ random variables
$x_{i,j}$ ($j = 1 \ldots l + 1$) and computes their sum $y_i \triangleq \sum_{j=1}^{l+1} x_{i,j}$.
It calculates the additive inverse^2 $\bar{y}_i$ of $y_i$ and adds it to
its own random element, i.e., $r_i \leftarrow r_i + \bar{y}_i$. At the same
time, $P_i$ contacts $l + 1$ randomly selected other players and
sends one variable $x_{i,j}$ to each of them: each contacted
player $P_j$ will then increment its random element by $x_{i,j}$,
i.e., $r_j \leftarrow r_j + x_{i,j}$. This method is secure against collusion
of up to $l$ players. Notably, the value of $l$ is a free parameter,
independent from the system size $n$, which can be tuned to
trade off communication overhead against robustness to collusion:
both scale linearly in $l$.

^2 In modular arithmetic the additive inverse $\bar{y}$ of $y$ is the
element that satisfies $y + \bar{y} = 0$. For real numbers in $[0,p]$,
$\bar{y} = p - y + 1$, while for binary strings $\bar{y} = y$.
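The RS generation and the summation of eq. (2) can be sketched in Python as follows. This is a minimal single-process simulation under stated assumptions, not a networked implementation: the function names (`generate_random_set`, `public_inputs`, `aggregate`), the choice of modulus $2^{32}$ and the use of the `secrets` module as randomness source are all illustrative and do not come from the scheme description.

```python
import secrets

P = 2**32  # assumed modulus p = 2^q: modular addition is a 32-bit wrap-around

def generate_random_set(n, l, p=P):
    """Simulate RS generation for n players with collusion threshold l."""
    if l + 1 > n - 1:
        raise ValueError("each player needs l+1 distinct other players")
    rng = secrets.SystemRandom()
    r = [0] * n  # every random element starts at the null element
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in rng.sample(others, l + 1):   # l+1 distinct recipients
            x = secrets.randbelow(p)          # random variable x_{i,j}
            r[j] = (r[j] + x) % p             # recipient P_j adds x_{i,j}
            r[i] = (r[i] - x) % p             # P_i accumulates the inverse of y_i
    return r

def public_inputs(a, r, p=P):
    """Each player announces v_i = a_i + r_i (mod p)."""
    return [(ai + ri) % p for ai, ri in zip(a, r)]

def aggregate(v, p=P):
    """The master sums the public inputs; the random elements cancel out."""
    return sum(v) % p

# Usage: six players, tolerating up to l = 2 colluding players.
a = [10, 20, 30, 40, 50, 60]
r = generate_random_set(len(a), 2)
total = aggregate(public_inputs(a, r))
```

Subtracting each $x_{i,j}$ individually from $r_i$ is equivalent to adding the inverse of their sum $y_i$, so the zero-sum condition holds by construction.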

Computation phase. With GCR the computation is 
basically a summation over $n$ public inputs, the $v_i$'s, and
no particular constraint applies to the aggregation method 
which can be centralized or distributed. For the sake of sim- 
plicity, we assume in the following a fully centralized scheme, 
with a single master — not necessarily a player — that is 
in charge of launching the query, collecting the n public in- 
puts, computing the result and finally publishing it to all the 
players. Another possible option is tree-based aggregation: 
players are arranged into a tree, where each node collects the 
inputs from its children and sends the summation result to 
its parent node, until the root computes and publishes the 
final result. More sophisticated peer-to-peer methods can
also be adopted at the cost of some additional coordination 
overhead. The point to be taken is that the GCR method is 
oblivious to the particular input aggregation scheme. 
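As an illustration of the tree-based option, the following Python sketch aggregates already randomized public inputs along a tree. The `children` map, the node numbering and the example values are hypothetical and serve only to show that partial sums can be combined in any order.

```python
def tree_aggregate(node, children, v, p):
    """Return the modulo-p sum of the public inputs in the subtree of node."""
    total = v[node]
    for child in children.get(node, []):
        # each node adds the partial sums reported by its children
        total = (total + tree_aggregate(child, children, v, p)) % p
    return total

# Example topology: player 0 is the root, players 1 and 2 are its children,
# players 3 and 4 are children of player 1.
children = {0: [1, 2], 1: [3, 4]}
v = [7, 11, 13, 17, 19]   # public inputs v_i, assumed already randomized
result = tree_aggregate(0, children, v, 2**32)
```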

2.3 Extension to multiplication 

It is straightforward to adapt the GCR scheme to support
multiplication of positive integers. First, the input and output
data $a_i$, $v_i$ and $A$ must be defined over the multiplicative
group $Z_p^*$ with $p$ a prime number: primality guarantees
that each element has a unique multiplicative inverse element
(note the difference with additive GCR which does not
require primality of $p$). Second, all modulo-$p$ additions are
replaced by modulo-$p$ multiplications. Third, the balancing
constraint eq. (1) is replaced by:

$$\prod_{i=1}^{n} r_i = 1 \qquad (3)$$



In this way we obtain a multiplicative variant of the addi- 
tive sharing scheme, which to the best of our knowledge was 
never considered in previous literature. It is important to 
remark that GCR can support either addition or multipli- 
cation, but it can not compose addition and multiplication 
operations without reconstructing and resharing values. In 
the secret evaluation scheme, it is therefore not computa- 
tionally complete. 

Finally, note that multiplicative GCR cannot take zero as
private input, as that would automatically force to zero also
the public output, i.e., $a_i = 0 \Rightarrow v_i = a_i \cdot r_i = 0 \;\forall r_i$, therefore
leaking the private value. In practice, before launching
a secret multiplication, one can easily check for the presence
of zero inputs, e.g., with a preliminary round of Conditional
Counting (see §4.2).
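A possible rendering of the multiplicative variant is sketched below in Python. It is a single-process simulation under assumptions: the prime $2^{31} - 1$, the function names and the use of `pow(x, -1, p)` (Python 3.8 or later) for modular inverses are illustrative choices, not part of the scheme description.

```python
import secrets

P = 2_147_483_647  # the Mersenne prime 2^31 - 1, chosen only for illustration

def generate_multiplicative_rs(n, l, p=P):
    """Simulate RS generation so that the product of all r_i is 1 modulo p."""
    rng = secrets.SystemRandom()
    r = [1] * n  # the null element of the multiplicative group is 1
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in rng.sample(others, l + 1):
            x = 1 + secrets.randbelow(p - 1)     # nonzero element in 1..p-1
            r[j] = (r[j] * x) % p                # recipient multiplies by x
            r[i] = (r[i] * pow(x, -1, p)) % p    # sender multiplies by x^{-1}
    return r

def secure_product(a, r, p=P):
    """Multiply the public inputs v_i = a_i * r_i (mod p)."""
    result = 1
    for ai, ri in zip(a, r):
        result = (result * ((ai * ri) % p)) % p
    return result

# Usage: the true product 1155 must be smaller than p to be read back exactly.
a = [3, 5, 7, 11]
r = generate_multiplicative_rs(len(a), 1)
product = secure_product(a, r)
```

Drawing each random variable from $1 \ldots p-1$ keeps every random element invertible, which is why a zero private input would be the only way to obtain a zero public input.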

2.4 Sensitivity of Output 

It is important to note that SMC in general (not only E-SMC)
only guarantees that no information is leaked from the
computation process. That is, it solves the problem of how
to compute a function $f()$ on distributed data in a privacy-preserving
way. An orthogonal problem is to find out what is
safe to compute. Just learning the resulting value $f()$ could
allow the inference of sensitive information. For example,
if the private input bits must remain secret, computing the
logical AND of all input bits is insecure in itself: if the final
result is 1, all input bits must be 1 as well and are
thus no longer secret. In SMC, it is the responsibility of
the input providers to verify that learning $f()$ is acceptable,
in the same way as they have to verify this when using a
trusted third party. While with SMC this analysis has to
be performed for the final result only, in E-SMC it has to
be performed individually for each step computing public
intermediate results.

A recently suggested approach to deal with this is differential
privacy [10][15], which systematically randomizes
answers to database queries to prevent inference of sensitive
input data. If data records are independent, it guarantees
that it is statistically impossible to infer the presence
or absence of single records in the database from answers
to queries. Differential privacy and SMC complement each
other very well. Using differential privacy, it is possible to
specify a randomized output $f()$ that is safe for public release.
Using SMC, it is possible to actually compute $f()$ in
a privacy-preserving manner, without relying on a trusted
third party. Intuitively, the more strongly $f()$ aggregates input
data, the less randomness needs to be added.

3. SYSTEM-DESIGN CONSIDERATIONS 

In this section we consider a number of system-level as- 
pects. In particular, we propose to split the GCR opera- 
tion into an offline generation of RS and online aggrega- 
tion phase, and show how joins and leaves of nodes can be 
handled efficiently. We also compare the GCR scheme to 
Shamir's secret sharing scheme, which, among the existing 
alternatives for performing SMC, allows the most efficient 
solutions. 

3.1 Offline generation of Random Sets 

One key advantage of GCR is that the process of gen- 
erating the RS is completely decoupled — and can be run 
independently — from the actual computation round. This 
has important implications for the design of a massive-scale 
system, enabling efficient management of the communica- 
tion load and minimal response delay. We devise a system 
where lists of RS are generated offline and stored for later 
use. At any time, each player $P_i$ has available a collection of
random elements $r_i[u]$, indexed in $u$, which can be readily
used for future computation rounds. The communication
protocol must ensure that the RS indexing is unambiguous and
synchronized across all players. During the online computation
phase, the query command broadcast by the central
master will indicate explicitly the RS index to be used for
the production of the public inputs $v_i$'s.

Performing RS generation offline brings several advantages.
First, it minimizes the query response delay down
to the same value as an equivalent cleartext summation.
Second, it reduces the impact of communication
overhead on the network load by scheduling the RS generation
process in periods of low network load (e.g., at night
or on weekends). Moreover, generation of multiple RS can be
batched, meaning that in a single secure connection (typically
SSL over TCP) two players can exchange multiple
<variable, index> pairs $\{x_{i,j}[u], u\}$ which collectively build
a collection of RS $\{r[u]\}$. This greatly reduces the communication
overhead associated with connection establishment
(handshaking, authentication, key exchange, etc.).

3.2 Joining and leaving 

In the GCR scheme, the set of players participating in the 
computation round must match exactly the set of players 
that have previously built the RS: the final result will not be 
reconstructed if the two sets differ by even a single element. 
If RSs are generated offline, the set of players might have
changed during the interval between the generation of $r[u]$
and its consumption in a query. It would be very impractical
to trash all pre-computed RSs upon every new player joining
or leaving the system, an event not infrequent for systems
with many players. Fortunately this is not necessary and
each legacy RS can be incrementally adjusted upon a new join
or leave with only $l + 1$ operations.

When a new player $P_i$ joins the system, it learns from
other players the index range currently in use $\{u_1 \ldots u_2\}$
(note this information is public) and computes a set of random
variables $x_{i,j}[u]$ for $j = 1 \ldots l + 1$ and $u \in \{u_1 \ldots u_2\}$.
It then sets its local random elements as $r_i[u] = \bar{y}_i[u]$ (recall
that $y_i = \sum_{j=1}^{l+1} x_{i,j}$). Then for each index value $u$
it selects $l + 1$ other players to which it sends the individual
variables $x_{i,j}[u]$. Similarly, when an existing player $P_i$
wants to leave the system, it must first "release" its random
elements $r_i[u]$. The simplest way to accomplish that is to
pass the value of $r_i[u]$ to another randomly selected
player $P_j$ and let the latter update its local random element
as $r_j[u] \leftarrow r_j[u] + r_i[u]$. Note that we are assuming a
"cooperative leaving" behavior: players release their unused
random elements to the system before leaving. However, if
a player shuts down without releasing its random elements
(e.g., due to failure, power off or disconnection), all RSs
in the entire system are invalidated and become useless. In
large-scale systems such events might not be infrequent, and
proper countermeasures must be adopted to minimize their
impact (e.g., node redundancy).
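The incremental adjustment of one pre-computed RS (a single index $u$) upon join and cooperative leave can be sketched as follows. The list-based representation of the RS, the function names `join` and `leave`, and the modulus $2^{32}$ are assumptions made for this single-process illustration.

```python
import secrets

P = 2**32  # assumed modulus

def join(r, l, p=P):
    """A new player joins: it sends l+1 random variables to existing players
    and keeps the additive inverse of their sum as its own random element."""
    rng = secrets.SystemRandom()
    r = list(r)
    new_element = 0
    for j in rng.sample(range(len(r)), l + 1):
        x = secrets.randbelow(p)
        r[j] = (r[j] + x) % p                 # existing player P_j adds x
        new_element = (new_element - x) % p   # newcomer accumulates the inverse
    return r + [new_element]

def leave(r, i, p=P):
    """Player i leaves cooperatively: its random element is handed over to
    another randomly selected player, preserving the zero-sum condition."""
    rng = secrets.SystemRandom()
    r = list(r)
    released = r.pop(i)
    j = rng.randrange(len(r))
    r[j] = (r[j] + released) % p
    return r

# Usage: start from a valid zero-sum RS of four players.
rs = [5, P - 5, 123, P - 123]
rs_after_join = join(rs, 2)
rs_after_leave = leave(rs_after_join, 0)
```

In both cases the sum of the random elements stays at the null element, so the adjusted RS remains usable for a later query.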

3.3 GCR versus Shamir's Scheme 

We now compare GCR to Shamir's secret sharing scheme [20],
denoted by SSS. E-SMC, along with all the use cases described
in the following sections, can be implemented with
either GCR or SSS. In GCR, reconstruction of public values 
is implicitly done after each processing step, while in SSS 
reconstruction needs to be scheduled explicitly if desired. 

In SSS, a secret value $s$ is shared among a set of $n$ players
by generating a random polynomial $f$ of degree $t < n$ over
a prime field $Z_p$, such that $f(0) = s$. Each player $i = 1 \ldots n$
then receives an evaluation point $s_i = f(i)$, called the share
of player $i$. The secret $s$ can be reconstructed from any $t + 1$
shares using Lagrange interpolation but is completely undefined
for $t$ or fewer shares. Because SSS is linear, addition of
two shared secrets can be computed by having each player
locally add his shares of the two values. Multiplication of
two shared secrets requires an extra round of communication
to guarantee randomness and to correct the degree of
the new polynomial [11]. Thus, a distributed multiplication
requires a synchronization round with $O(n^2)$ total messages.
For multiplications to work, the degree must be such that
$n \ge 2t + 1$.
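For comparison, sharing, local addition and reconstruction in SSS can be sketched in Python as below. The prime $2^{31} - 1$ and the function names `share`, `add_shares` and `reconstruct` are assumptions for illustration; the multiplication round with degree reduction is deliberately left out.

```python
import secrets

P = 2_147_483_647  # the Mersenne prime 2^31 - 1, assumed for illustration

def share(secret, n, t, p=P):
    """Evaluate a random polynomial f of degree at most t with f(0) = secret
    at the points 1..n; player i receives the share (i, f(i))."""
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(t)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):   # Horner evaluation modulo p
            acc = (acc * x + c) % p
        return acc
    return [(i, f(i)) for i in range(1, n + 1)]

def add_shares(sa, sb, p=P):
    """Addition is local: each player adds its two shares."""
    return [(x, (ya + yb) % p) for (x, ya), (_, yb) in zip(sa, sb)]

def reconstruct(shares, p=P):
    """Lagrange interpolation at x = 0 from at least t+1 shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = (num * (-xj)) % p
                den = (den * (xi - xj)) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

# Usage: n = 5 players, threshold t = 2, so any 3 shares reconstruct.
sa = share(42, 5, 2)
sb = share(100, 5, 2)
value = reconstruct(add_shares(sa, sb)[1:4])
```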

There are two main advantages of SSS over GCR. First,
the basic operations for addition and multiplication accept
public, private, and also secret input data and output secret
data. That is, even without reconstructing intermediate values,
it is possible to arbitrarily compose secret operations,
corresponding to Fig. 1(a). The GCR scheme allows composition
of addition and multiplication only if intermediate
results are publicly reconstructed, because the sharing operation
to be applied (additive or multiplicative) depends on
the next operation type. The second advantage of SSS is
that it realizes a $(t + 1)$-out-of-$n$ threshold sharing scheme.
That is, any set of $t + 1$ players can reconstruct a secret,
being robust against up to $n - t - 1$ "missing" players. In
GCR, a single non-responsive player renders reconstruction
of secret information impossible.

While E-SMC can also be implemented with SSS, GCR 
is highly optimized for online processing of queries. SSS 
requires linear storage overhead (n shares to be stored for 
each secret value), whereas GCR has constant storage over- 
head (one random value per private input). When process- 
ing the query, GCR involves zero communication overhead, 
since the players just send their randomized values instead 
of the original value to the aggregation node(s). In SSS, 
when n players want to sum up their values, each of them 
generates n shares ad-hoc and distributes them to the oth- 
ers. In principle, the players could pre-generate t random 
shares and distribute them in a pre-processing phase. In 
the online phase, they would calculate the remaining n — t 
shares using Lagrange interpolation, such that the interpo- 
lated polynomials represent their actual secrets. However, 
after distributing the last shares, each player still needs to 
perform n — 1 additions locally and for final reconstruction, 
send their shares of the sum to the aggregation node(s), 
which eventually interpolates the final polynomial. It is not
obvious how to further split this process into an offline pre-processing
and an online phase similar to GCR, where a
single message and addition operation is enough.

Another advantage of GCR is that the additive scheme is
not restricted to prime fields. This allows setting the field size
to $2^{32}$ or $2^{64}$ and therefore using implicit 32 (64) bit register
wrap-arounds of CPU operations instead of performing an
explicit modulo operation^3. Furthermore, the multiplicative
GCR scheme does not need an additional synchronization
round like SSS.

In summary, provided that intermediate results are not 
sensitive, GCR allows for a much smaller storage and com- 
putation overhead during the online processing phase. 

^3 In general, $\mathrm{mod}(a,n) = a - n \cdot \lfloor a/n \rfloor$, which uses an
additional division, multiplication, and subtraction operation.

4. BASIC OPERATIONS

Here we briefly sketch some basic operations that can be
mapped to a secure addition with a public parameter and/or
a public conditional statement. As such, they can be accomplished
directly by the GCR method or any other scheme
for secure addition.

4.1 Summation 

The summation of positive real scalars $A = \sum_i a_i$, with
$a_i \in [0,p]$, is performed directly as explained above via
modulo-$p$ additions. The only significant constraint is on
the value of $p$, which must be greater than the total sum,
i.e., $p > A$. The method can be easily extended to handle
negative elements defined in $[d_1, d_2]$, with $d_1 < 0 < d_2$,
by imposing a fixed shift $+|d_1|$ to all inputs $a_i$'s and then
subtracting $n|d_1|$ from the output. Note however that summation
of negative numbers is unusual in traffic monitoring.
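The shift for negative inputs can be illustrated with a few lines of Python. The function name `shifted_sum` is hypothetical, and the plain modular sum is only a stand-in for the secure summation round.

```python
def shifted_sum(values, d1, p):
    """Sum integers that may be negative (lower bound d1 < 0) with modulo-p
    arithmetic: shift every input by |d1|, sum, then remove n*|d1|."""
    shift = abs(d1)
    shifted = [a + shift for a in values]   # all inputs are now non-negative
    total = sum(shifted) % p                # stand-in for the secure summation
    return total - len(values) * shift

# Usage: three players with inputs in [-5, 5]; p must exceed the shifted sum.
result = shifted_sum([-3, 4, -1], -5, 2**16)
```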

4.2 Conditional Counting 

We consider two versions of Conditional Counting (CC) 
queries: "player counting" and "item counting". In the first 
version, the goal is to count how many players match a pub- 
lic condition C which is explicitly announced as a public 
query argument. Each player $P_i$ sets $a_i$ to 1 or 0 depending
on whether or not it matches the condition $C$. Therefore CC
maps to a particular case of summation, where $a_i \in \{0, 1\}$
and $p > n + 1$. In the "item counting" version instead the
goal is to count the total number of items (e.g., hosts or 
alarm records) matching the condition C, where multiple 
items might be observed by a single player. Again, counting 
maps directly to summation of integers. 

CC queries can serve as a preliminary round to other more
advanced queries, e.g., to identify the presence of zero inputs
before multiplication (see §2.3), or to discover the exact
number of active players before a round of Anonymous
Scheduling (see §5.4).



4.3 Histograms and max/min discovery 

Each player $P_i$ has a scalar private value $b_i$ and the problem
is to derive a $K$-bin histogram of the distribution of the
$b_i$'s. This can be easily achieved by using CC queries, indexed
in $k$, with condition $C$: $Y_{k-1} < b_i < Y_k$, wherein the
threshold values $\{Y_k, k = 1 \ldots K\}$ represent the bin boundaries.
The number of CC queries is equal to the number of
bins $K$. However, since bin boundaries are pre-determined,
the queries can be batched in a single round using an array
of $K$ counters.
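The batched histogram query can be sketched in Python as an addition over arrays of counters. Several elements here are assumptions: `zero_sum_masks` is a simplified, centrally generated stand-in for the distributed RS generation, the bins are treated as half-open intervals (lower bound excluded, upper bound included), and the counter modulus is $2^{16}$.

```python
import secrets

P = 2**16  # assumed counter modulus p = 2^q with q = 16

def zero_sum_masks(n, k, p=P):
    """Simplified stand-in for RS generation: n arrays of k counters whose
    column-wise sums are all zero modulo p."""
    masks = [[secrets.randbelow(p) for _ in range(k)] for _ in range(n - 1)]
    last = [(-sum(column)) % p for column in zip(*masks)]
    return masks + [last]

def local_bins(b, bounds):
    """One-hot array: entry k is 1 iff bounds[k] < b <= bounds[k+1]."""
    return [1 if lo < b <= hi else 0 for lo, hi in zip(bounds, bounds[1:])]

def secure_histogram(values, bounds, p=P):
    """Each player masks its one-hot array; the column sums give the bins."""
    n, k = len(values), len(bounds) - 1
    masks = zero_sum_masks(n, k, p)
    public = [[(a + r) % p for a, r in zip(local_bins(b, bounds), m)]
              for b, m in zip(values, masks)]
    return [sum(column) % p for column in zip(*public)]

# Usage: five players, three bins (0,5], (5,10], (10,20].
histogram = secure_histogram([1, 5, 7, 12, 18], [0, 5, 10, 20])
```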

In a similar way it is possible to discover the maximum
value of the $b_i$'s. Again, one can resort to a sequence of CC
queries where the threshold values $Y_k$ are adjusted dynamically
based on the previous result following a binary search.
If the $b_i$'s are integers upper bounded by $p$, the maximum is
found in $\log_2 p$ rounds. Note however that the results of all
intermediate queries are public, therefore this method discloses
more information about the distribution of the $b_i$'s than just
the maximum. In a similar way it is possible to discover the
minimum.
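The binary search over public CC results can be sketched as follows. The helper `count_matching` is a cleartext stand-in for one secure "player counting" round with condition $b_i > Y$, and both function names are hypothetical; with integer values in $[0, p]$ this particular variant needs roughly $\log_2 p$ rounds.

```python
def count_matching(values, threshold):
    """Stand-in for one CC query: number of players with b_i > threshold."""
    return sum(1 for b in values if b > threshold)

def discover_max(values, p):
    """Binary search for the maximum of integers in [0, p] using only the
    public results of successive counting queries."""
    lo, hi = 0, p            # invariant: lo <= maximum <= hi
    rounds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        rounds += 1
        if count_matching(values, mid) > 0:
            lo = mid + 1     # at least one player holds a value above mid
        else:
            hi = mid         # no player holds a value above mid
    return lo, rounds

# Usage: three players with private values bounded by p = 1024.
maximum, rounds = discover_max([3, 900, 17], 1024)
```

Every intermediate count is public, which is exactly the additional disclosure noted in the text.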

5. ADVANCED OPERATIONS 

Here we show a few examples of more advanced opera- 
tions which can be mapped to E-SMC queries in combination 
with specific constraints on the input data elements and/or 
a proper local transformation function $g()$. For each of them



we illustrate a possible application for collaborative network 
monitoring. This section is one of the main contributions of 
the paper: to the best of our knowledge we are the first to 
"interpret" the following operations as applications of SMC 
using the additive sharing scheme. 

5.1 Multiplication 

Multiplication of positive integers can be accomplished
directly by the multiplicative version of GCR presented in
§2.3. Alternatively, the multiplication of positive real numbers
$B = \prod_i b_i$ ($b_i > 0$) can be mapped to a summation
in the logarithmic domain. Each player locally computes
$a_i = \log_c b_i$ and then the computation proceeds as a simple
summation of real numbers, leading to $A = \sum_i a_i$. Finally,
the result is computed as $B = c^A$. Some numerical issues
might arise when the product involves a large number of
non-unitary terms, due to the accumulation of rounding errors
in the representation of the logarithmic values; these
however are well-studied problems.
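The logarithmic mapping can be illustrated in Python as below. The base $c = 2$ and the function name are arbitrary choices for the example, and the plain floating-point sum is only a stand-in for the secure summation of real numbers.

```python
import math

def product_via_log_sum(values, c=2.0):
    """Map a product of positive reals to a sum in the logarithmic domain."""
    a = [math.log(b, c) for b in values]   # local transformation a_i = log_c b_i
    A = sum(a)                             # stand-in for the secure summation
    return c ** A                          # the result B = c^A

# Usage: the product 2.0 * 3.5 * 4.0 is recovered up to rounding error.
B = product_via_log_sum([2.0, 3.5, 4.0])
```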

5.2 Set Operations 

In this section, we first describe how (probabilistic) set 
operations can be implemented using bloom filters with any 
SMC scheme that supports both, private additions and mul- 
tiplications (e.g., SSS). We then outline what subpart of that 
functionality can easily be implemented with GCR. 

Bloom filters (BF) are powerful data structures for
representing sets [5]. A Bloom filter representing a set
$S = \{x_1, x_2, \ldots, x_n\}$ of $n$ elements is described by an
array of $m$ bits, initially all set to 0. The BF uses $k$
independent hash functions $h_1, \ldots, h_k$ with range
$\{1, \ldots, m\}$. For each element $x \in S$, the bits $h_i(x)$
are set to 1 for $1 \le i \le k$. To check whether an element $y$
is a member of $S$, we simply check whether all bits $h_i(y)$ are
set to 1. As long as the BF is not saturated, i.e., $m$ is chosen
sufficiently large to represent all elements, the total number of
non-zero buckets allows $|S|$ to be estimated accurately. Counting
Bloom Filters (CBF) are a generalization of BFs that use integer
arrays instead of bit arrays. Thus, CBFs can represent multisets,
in which each element can occur more than once. Note that while a
(C)BF allows efficient checks for element membership, it cannot,
in general, be used to enumerate the contained elements. Compared
to state-of-the-art approaches for privacy-preserving set
operations, which use homomorphic encryption (e.g., fU), this
allows for very efficient and scalable solutions.
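
To make the data structure concrete, the following is a minimal sketch of a counting Bloom filter, whose non-zero buckets also give the plain BF view. The class name, the use of salted SHA-256 to emulate k independent hash functions, and the 0-based bucket indices are assumptions of this example, not part of the original scheme.

```python
import hashlib

class CountingBloomFilter:
    """Counting Bloom filter with m integer buckets and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.buckets = [0] * m

    def _positions(self, item):
        # Emulate k hash functions by salting SHA-256 with the index i.
        # Indices are 0..m-1 here, versus 1..m in the text.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.buckets[pos] += 1

    def __contains__(self, item):
        # Membership test: all k buckets must be non-zero.
        return all(self.buckets[pos] > 0 for pos in self._positions(item))

    def to_bits(self):
        # Plain Bloom filter view: 1 where the bucket is non-zero.
        return [1 if count > 0 else 0 for count in self.buckets]

cbf = CountingBloomFilter(m=1024, k=3)
for address in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    cbf.add(address)
print("10.0.0.1" in cbf)   # True: inserted elements are always found
print(sum(cbf.to_bits()))  # number of non-zero buckets
```

Inserted elements are always reported as members; elements that were never inserted can occasionally be reported as members too, which is why the set operations built on top are probabilistic.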

Set Union. 

If each player $i$ has a local set $S_i$, the players can
construct the union of their sets
$S = S_1 \cup S_2 \cup \ldots \cup S_n$ by performing a private
OR ($\lor$) over their BF arrays. If the inputs are multisets,
represented by CBFs, the aggregation operation is addition
instead of OR. Using CBFs, each player can learn the number
of occurrences of specific elements across all players or
the number of other players that report each element (by
using a BF as input). From the aggregate CBF, one could,
for instance, compute the entropy of the empirical element
distribution.
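
The two aggregation rules can be sketched on toy bucket arrays. This is only an illustration: the plain Python loops stand in for the private aggregation, the function names are hypothetical, and the entropy is computed over bucket counts, which matches the element distribution only under the added assumption of a single hash function and no hash collisions.

```python
import math

def union_bf(bit_arrays):
    """Bucket-wise OR over the players' Bloom filter bit arrays."""
    return [int(any(bits)) for bits in zip(*bit_arrays)]

def union_cbf(count_arrays):
    """Bucket-wise addition over the players' counting Bloom filters."""
    return [sum(counts) for counts in zip(*count_arrays)]

def bucket_entropy(counts):
    """Shannon entropy (in bits) of the empirical bucket distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Three players, four buckets each (toy inputs).
bf_inputs = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
print(union_bf(bf_inputs))           # [1, 1, 1, 0]
print(union_cbf(bf_inputs))          # [3, 1, 1, 0]: players reporting each bucket
print(bucket_entropy([2, 2, 0, 0]))  # 1.0
```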

Set Intersection. 

In order to perform set intersection on BFs, the players
simply use the AND ($\land$) operation for aggregating their
sets, $S = S_1 \cap S_2 \cap \ldots \cap S_n$. Only buckets set
to 1 in all the players' BFs will evaluate to 1 in the aggregate
BF. In this specific scenario, it is also possible for each
player $i$ to enumerate all elements in $S$ simply by iterating
over all $x \in S_i$ and checking whether $x \in S$, since
$S \subseteq S_i$.
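
The intersection and the local enumeration step can be sketched as follows. The filter size, the number of hash functions, the salted SHA-256 hashing, and the sample addresses are all assumptions of this example, and the bucket-wise AND is computed in the clear as a stand-in for the private aggregation.

```python
import hashlib

M, K = 256, 3  # assumed filter size and number of hash functions

def positions(item):
    """K bucket indices for an item (salted SHA-256, an assumed choice)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % M
        for i in range(K)
    ]

def bloom(items):
    """Bloom filter bit array of a local set."""
    bits = [0] * M
    for item in items:
        for pos in positions(item):
            bits[pos] = 1
    return bits

def intersect(filters):
    """Bucket-wise AND: a bucket is 1 only if it is 1 for every player."""
    return [int(all(bits)) for bits in zip(*filters)]

def enumerate_intersection(own_set, aggregate):
    """A player tests each of its own elements against the aggregate BF."""
    return {x for x in own_set if all(aggregate[p] for p in positions(x))}

local_sets = [
    {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
    {"10.0.0.2", "10.0.0.3", "10.0.0.4"},
    {"10.0.0.2", "10.0.0.3", "10.0.0.9"},
]
aggregate = intersect([bloom(s) for s in local_sets])
found = enumerate_intersection(local_sets[0], aggregate)
print(sorted(found))
```

The enumerated set always contains the true intersection; because Bloom filters admit false positives, it may occasionally contain an extra element of the player's own set.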

Set Operations with GCR. 

GCR directly supports the addition operation and therefore
set union on multisets. If the counts in each bucket
are not sensitive, the union and intersection of sets can be
computed from the public union of multisets: the
intersection, for instance, is given by selecting all elements
with count $n$. However, private union and intersection directly
on sets cannot be delivered by GCR. In fact, union requires
OR, i.e., a combination of addition and multiplication not
supported by GCR, while the problem with intersection is
that multiplicative GCR does not include the value 0 (see §2.3).
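
When bucket counts are not sensitive, the derivation described above reduces to thresholding the public sum. A minimal sketch, assuming each player contributes a 0/1 Bloom filter array and that the bucket-wise addition (here an ordinary sum) is what the summation round returns; the function names are illustrative.

```python
def public_counts(bit_arrays):
    """Bucket-wise addition of the players' 0/1 arrays (public result)."""
    return [sum(bits) for bits in zip(*bit_arrays)]

def union_from_counts(counts):
    """A bucket belongs to the union if at least one player set it."""
    return [int(c >= 1) for c in counts]

def intersection_from_counts(counts, n):
    """A bucket belongs to the intersection if all n players set it."""
    return [int(c == n) for c in counts]

players = [[1, 0, 1, 1], [1, 1, 0, 1], [1, 0, 0, 1]]
counts = public_counts(players)
print(counts)                                          # [3, 1, 1, 3]
print(union_from_counts(counts))                       # [1, 1, 1, 1]
print(intersection_from_counts(counts, len(players)))  # [1, 0, 0, 1]
```

The price of this shortcut is that the per-bucket counts themselves become public, which is exactly the condition stated in the text.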



5.3 Anonymous publishing 

The goal is to let one player $P_1$ publish to all other players
a binary string $w$ without revealing its identity. The string
$w$ can be, for example, a malware payload that $P_1$ has
discovered with an IDS, or the description of an attack which
was observed locally. Moreover, $w$ could be used as a public
condition for a future Conditional Counting round (§4.2),
e.g., to discover how many other players have observed the
same event. There are several reasons why the publisher
may want to remain anonymous. First, knowing that it was
hit by the malware might be detrimental to its reputation
among customers. Second, such information might benefit
other potential attackers.

DC-nets [9] are a basic and unconditionally secure solution
for anonymous publishing. In the following, we devise an
alternative solution that does not require pair-wise shared
secrets and that deals with the problems of (i) detecting
collisions and (ii) scheduling the publication process to avoid
collisions.

Let $k$ denote the length of string $w$, and denote by $C(w)$
a Cyclic Redundancy Check (CRC) control field of length
$c$ computed on $w$ (the need for the CRC is explained below).
It is straightforward to map an Anonymous Publishing
round to a bit-wise summation on strings of length $k + c$.
The publisher $P_1$ sets its data element to the concatenation
of $w$ and $C(w)$, i.e., $a_1 = \langle w, C(w) \rangle$, while all
other players set their data elements to null ($a_j = 0$,
$j \neq 1$). Therefore the public result will return the string
$w$ in cleartext, i.e., $A = a_1 = \langle w, C(w) \rangle$, but
since the individual data elements remain unknown the identity
of the publisher cannot be reconstructed. Such a simple approach
works only if exactly one player attempts to publish in the
computation round: if two (or more) players $P_1$ and $P_2$
attempt to publish different strings, we have a collision,
i.e., the computed result will be the combination
$A = \langle w_1 \oplus w_2, C(w_1) \oplus C(w_2) \rangle$
($\oplus$ denoting bit-wise summation), from which neither of
the elements $w_1, w_2$ can be derived. However, the collision
can be easily revealed by CRC failure, as in general
$C(w_1 \oplus w_2) \neq C(w_1) \oplus C(w_2)$. The "collision
recovery" procedure can simply foresee the repetition
of new anonymous publishing rounds associated with a back-off
scheme to prevent the same players from colliding again in
the next round, a mechanism conceptually equivalent to
Slotted Aloha.
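
A single publishing round and its CRC-based collision check can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: `zlib.crc32` stands in for C(w), bit-wise summation is taken to be XOR, payloads have a fixed assumed length of 8 bytes, and the aggregation is computed in the clear in place of the private summation.

```python
import zlib
from functools import reduce

K_BYTES = 8  # assumed fixed payload length k, in bytes

def frame(w: bytes) -> bytes:
    """Data element <w, C(w)>: payload followed by its CRC-32."""
    if len(w) != K_BYTES:
        raise ValueError("payload must be exactly K_BYTES long")
    return w + zlib.crc32(w).to_bytes(4, "big")

NULL = bytes(K_BYTES + 4)  # data element of a non-publishing player

def xor_sum(elements):
    """Bit-wise summation (XOR) of equal-length byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), elements)

def decode(result: bytes):
    """Return w if the CRC field matches, or None to signal a failure."""
    w, crc = result[:K_BYTES], result[K_BYTES:]
    return w if zlib.crc32(w).to_bytes(4, "big") == crc else None

w = b"malware!"
print(decode(xor_sum([frame(w), NULL, NULL])))                 # b'malware!'
print(decode(xor_sum([frame(w), frame(b"attack42"), NULL])))   # None
```

In this sketch the two-publisher collision is detected because zlib's CRC-32 uses a non-zero initial value; whether a given CRC variant has this property should be checked before relying on it.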



Note that, with $a, b$ being bits, $a \lor b = a + b - ab$ and
$a \land b = ab$.



A simple "detection and recovery" approach is not effective
when the instantaneous rate of publishing attempts is high.
This is of particular concern in large-scale systems with
many players ($n \gg 1$) and/or in the presence of correlated
attempts (e.g., a spreading malware payload caught
simultaneously by different domains). In such cases it is
preferable to adopt a "collision prevention" method by orderly
scheduling the publishing rounds for different players. This can
be achieved by a single round of anonymous scheduling, as
explained below.

5.4 Anonymous scheduling 

The problem is defined as follows. Out of the total $n$
players, a subset of $m \le n$ "active" players are ready to
perform a given action, e.g., anonymous publishing. The problem
is then to schedule the $m$ active players without knowing or
revealing their identities. This apparently difficult task can
be easily accomplished by bit-wise summation over strings
of size $k \gg m$. At the query round, the inactive players
set their data elements to the null string, while each active
player $P_i$ draws a uniformly random integer $q_i \sim U(1, k)$
and then builds its data element $a_i$ with a single '1' at the
$q_i$-th position and all other bits set to '0'. The bitmap
length $k$ must be set large enough to ensure that the
bit-collision probability, i.e., the probability of two or more
players independently picking the same random value $q_i$, is
kept acceptably low.
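
The requirement that the bitmap be large enough can be quantified with the standard birthday-problem estimate. The following worked aside is not part of the original analysis; it assumes that the $m$ active players pick their positions independently and uniformly among $k$ values.

$$
P_{\text{no collision}} = \prod_{j=0}^{m-1} \left(1 - \frac{j}{k}\right) \approx \exp\left(-\frac{m(m-1)}{2k}\right)
$$

For a target collision probability $\varepsilon \ll 1$, this suggests choosing $k \gtrsim m(m-1)/(2\varepsilon)$; for example, $m = 10$ and $\varepsilon = 0.01$ give $k \approx 4500$.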

Assuming that no bit-collision has occurred, the final
(public) result $A$ is a bitmap with $m$ '1's and $k - m$ '0's.
Upon learning $A$, each active player $P_i$ checks whether the
bit in the $q_i$-th position is set to '1', and if so it counts
the number of '1's in the preceding positions, say $\mu_i$, from
which it learns that it has been scheduled in the subsequent
$(\mu_i + 1)$-th query round. If instead the $q_i$-th bit is '0',
$P_i$ infers that a collision has occurred and waits for the
next scheduling round.
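
The scheduling round can be sketched in a few lines. The sketch assumes that bit-wise summation is taken modulo 2 and uses 0-based bit positions; the function names, the seed, and the parameter values are illustrative, and the bitmap is built in the clear rather than through the private summation.

```python
import random

def build_bitmap(picks, k):
    """Public result A: bit-wise summation (modulo 2) of all data elements.

    Each active player contributes a single '1' at its pick; inactive
    players contribute all-zero strings and are therefore omitted.
    """
    bitmap = [0] * k
    for q in picks:
        bitmap[q] ^= 1
    return bitmap

def my_slot(q, bitmap):
    """1-based query round for the player who picked position q.

    Returns None if the bit at q is 0, i.e. a collision was detected.
    """
    if bitmap[q] == 0:
        return None
    return sum(bitmap[:q]) + 1  # number of preceding '1's, plus one

rng = random.Random(2024)  # fixed seed only for reproducibility
k, m = 64, 5
picks = [rng.randrange(k) for _ in range(m)]  # 0-based positions
bitmap = build_bitmap(picks, k)
for q in picks:
    print(q, my_slot(q, bitmap))
```

Each player needs only its own pick and the public bitmap to find its slot, so no identities are exchanged at any point.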

Note that in case of bit-collisions the round does not
completely fail: if a collision involves only two (or any even
number of) players, the colliding players will simply wait for
the next scheduling query. If three (or any odd number of)
players have collided on the same $q$-th bit, they would collide
again in the query round assigned to that bit. However, this is
not a serious problem as long as collisions in the query rounds
can be detected and recovered (e.g., by CRC failure in the case
of Anonymous Publishing).

The number of active players $m$ is relevant to the setting
of the bitmap length $k$ ($k \gg m$). One conservative
approach is to simply assume the worst case $m = n$.
Alternatively, a preliminary Conditional Counting query (§4.2)
might be launched to discover the exact value of $m$. The
latter approach has another advantage: with knowledge
of $m$, the occurrence of bit-collisions can be easily revealed
by comparing $m$ to the number of '1's in the final result, i.e.,
$|A|_1$. In fact, the difference $m - |A|_1$ equals the number
of colliding players. For example, $m - |A|_1 = 2$ implies that
only a two-player collision has occurred, and the master can
decide to validate the current scheduling round, implicitly
deferring the two colliding players to a future scheduling
round, or to invalidate it and immediately launch a new
scheduling round.

6. RELATED WORKS 

SMC is a cryptographic framework introduced by Yao [22]
and later generalized by Goldreich et al. [13]. SMC
techniques have been widely used in the data mining community.
For a comprehensive survey, please refer to [1]. Roughan
et al. [19] first proposed the use of SMC techniques for a
number of applications relating to traffic measurements,
including the estimation of global traffic volume and
performance measurements [18]. In addition, the authors
identified that SMC techniques can be combined with
commonly-used traffic analysis methods and tools, such as
time-series algorithms [2] and sketch data structures.

However, for many years, SMC-based solutions have mainly
been of theoretical interest due to impractical resource
requirements. Only recently, generic SMC frameworks
optimized for efficient processing of voluminous input data
have been developed [4], [8]. Today, it is possible to
process hundreds of thousands of elements distributed across
dozens of networks within a few minutes, for instance to
generate distributed top-k reports [6]. While these results are
compelling, they stick to the completely secret evaluation
scheme. Our work aims at boosting scalability even further
by relaxing the secrecy constraint for intermediate results.
As such, our approach can be applied only in cases where the
disclosure of intermediate results is not regarded as critical,
a quite frequent case in practical applications. Moreover,
we aim at optimizing the sharing scheme for fast computation
in the online phase.

When it comes to analyzing traffic data across multiple
networks, various anonymization techniques have been
proposed for obscuring sensitive local information (e.g., [21]).
However, these methods are generally not lossless and
introduce a delicate privacy-utility tradeoff [17]. Moreover, the
capability of anonymization to protect privacy has recently
been called into question, both from a technical [7] and a legal
perspective H^.

7. CONCLUSIONS 

The use of SMC techniques has recently been proposed 
to overcome the inhibiting privacy concerns associated with 
inter-domain sharing of network traffic data. However, the 
cost at which the cryptographic privacy guarantees of SMC 
are bought is tremendous. Although the design and im- 
plementation of basic SMC primitives have recently been 
optimized, processing time for queries is still in the order 
of several minutes and involves significant communication 
overhead. 

In this paper, we further boost the performance of
privacy-preserving network monitoring by two means. Firstly, we
identify that perfect secrecy of intermediate results is not
required in many cases. That is, we advocate the use of
"elementary" (as opposed to "complete") secure multiparty
computation (E-SMC) procedures for traffic monitoring.
E-SMC supports only simple computations with private input
and public output, i.e., it cannot handle secret input
or secret (intermediate) output. Secondly, we separate the
computation into an offline and an online phase. Our
proposed scheme GCR is based on additive secret sharing and
pre-generates random secret shares during the offline phase
with only constant storage overhead. In the online phase,
GCR allows actual queries to be processed with zero
communication overhead. This enables adoption of SMC techniques
on massive scales, both in terms of input data volume and
number of participants. In the second part, we introduce
a number of high-level primitives supported by GCR that
cover a wide range of use cases in network monitoring,
including the private generation of histograms, set operations,
and anonymous publishing.

In future work, we will evaluate GCR on real network
setups and study hybrid approaches combining GCR with SSS
to provide scalability and functional completeness.